You are viewing the RapidMiner Studio documentation for version 10.2.

Keep Document Parts (Text Processing)

Synopsis

Extracts the part of a token that matches a given regular expression and returns it as a new token.

Description

This operator extracts a part of a token using a regular expression. It searches for the first region within the token's text that matches the given regular expression and returns this region as a new token. If no such region is found, the token is discarded. Since this probably works best when the tokens are long, the operator is especially useful before the actual tokenization is applied during word vector creation.
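
The matching behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not the operator's implementation: the function names keep_part and keep_document_parts and the sample tokens are hypothetical, and Python's re.search stands in for the "first matching region" search (comparable to Java's Matcher.find()). No regex flags are applied, since the operator's flag handling is not stated here.

```python
import re
from typing import Iterable, List, Optional


def keep_part(token: str, extraction_regex: str) -> Optional[str]:
    """Hypothetical helper: return the first region of `token` that matches
    `extraction_regex`, or None if nothing matches."""
    match = re.search(extraction_regex, token)  # leftmost (first) matching region
    return match.group(0) if match else None    # whole match, not a capture group


def keep_document_parts(tokens: Iterable[str], extraction_regex: str) -> List[str]:
    """Hypothetical helper: replace each token by its matching region and
    discard tokens in which no region matches."""
    kept = []
    for token in tokens:
        part = keep_part(token, extraction_regex)
        if part is not None:
            kept.append(part)
    return kept


tokens = [
    "Title: Quarterly report. Summary: sales rose.",
    "No summary in this one.",  # lowercase "summary": no match, token is dropped
]
print(keep_document_parts(tokens, r"Summary: [^.]*\."))
```

Run as written, this keeps only "Summary: sales rose." from the first token and discards the second token entirely, because the pattern is case-sensitive.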

Input

  • document

    The document port.

Output

  • document

    The document port.

Parameters

  • extraction_regex

    This regular expression specifies the part of the text that is extracted and returned. Range:
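
Because the operator returns the whole region matched by extraction_regex, any marker text that the pattern consumes is kept as well. The sketch below uses Python's re module and a hypothetical "Abstract:" / "Keywords:" layout to show this. It assumes that lookarounds behave the same way in the operator's regex engine. The (?<=...) and (?=...) constructs used here are also supported by Java's java.util.regex, but whether the operator applies flags such as case-insensitivity or dot-matches-newline is not stated.

```python
import re

# Hypothetical document text; only the abstract should be kept.
text = "Title: A study. Abstract: We measure X. Keywords: y, z"

# Marker kept: "Abstract:" is consumed, so it is part of the returned region.
# The lazy .*? stops before optional whitespace plus "Keywords:", or at the end.
with_marker = re.search(r"Abstract:.*?(?=\s*Keywords:|$)", text).group(0)

# Marker dropped: the lookbehind requires "Abstract: " before the match
# without including it in the matched region.
without_marker = re.search(r"(?<=Abstract: ).*?(?=\s*Keywords:|$)", text).group(0)

print(with_marker)     # the region including the "Abstract:" marker
print(without_marker)  # the region after the marker only
```

Note that "." does not match line breaks by default in either Python or Java. Multi-line regions therefore need a pattern such as [\s\S]*? or an inline (?s) flag.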